VSTI - Prajna Project

VAST 2008 Challenge
Mini Challenge 1: Wiki Editors

Authors and Affiliations:

Edward Swing, Vision Systems & Technology, Inc.

Student Team: NO

Tool(s):

The Prajna Project is a Java toolkit designed to provide various capabilities for visualization, knowledge representation, geographic displays, semantic reasoning, and data fusion. Rather than attempt to recreate the significant capabilities provided in other tools, Prajna instead provides software bridges to incorporate other toolkits where appropriate. Prajna will be released to the Open Source community in the near future.

For this challenge, I developed a custom application using the Prajna Project. I created a Wiki-Edit reader to load the Wikipedia edits and parse information from the records. This reader identified vandals and reversions, and parsed the topics from the comments when available.
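
As an illustration of the kind of comment parsing the reader performs, the following is a minimal Java sketch. It is not the Prajna reader. It assumes MediaWiki-style edit summaries, where a section name appears between `/*` and `*/`, and it assumes a handful of common reversion phrases ("rv", "revert", "good faith", "undid"). The class name `CommentParser`, the patterns, and the mapping of "undid" to the Undoing category are all hypothetical.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of edit-comment parsing; not part of Prajna. */
class CommentParser {
    enum ReversionType { NONE, SIMPLE, GOOD_FAITH, UNDO }

    // Assumed MediaWiki convention: "/* Section name */ rest of comment".
    private static final Pattern SECTION = Pattern.compile("/\\*\\s*(.+?)\\s*\\*/");
    // Assumed phrase lists; a real reader would need many more variants.
    private static final Pattern UNDO = Pattern.compile("\\bund(id|o)\\b");
    private static final Pattern GOOD_FAITH = Pattern.compile("\\bgood[ -]?faith\\b");
    private static final Pattern REVERT = Pattern.compile("\\brv[vt]?\\b|\\brevert");

    /** Returns the section (topic) named in the comment, if any. */
    static Optional<String> topic(String comment) {
        Matcher m = SECTION.matcher(comment);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Classifies the comment; the more specific phrases are checked first. */
    static ReversionType reversionType(String comment) {
        String text = comment.toLowerCase(Locale.ROOT);
        if (UNDO.matcher(text).find()) return ReversionType.UNDO;
        if (GOOD_FAITH.matcher(text).find()) return ReversionType.GOOD_FAITH;
        if (REVERT.matcher(text).find()) return ReversionType.SIMPLE;
        return ReversionType.NONE;
    }
}
```

Checking the specific phrases before the generic "revert" pattern keeps a comment such as "Reverted good faith edits" from being counted as a simple reversion.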

I designed custom graph arrangements and user interface components to analyze this particular data set. This analysis required custom graph arrangements because of the meaning of the graph data, and because these graphs frequently became disjoint when certain features were removed. Prajna includes graph display capabilities with extensions for both arrangements and visual representations, which I used for the custom displays.

The reader and graph displays were tested and verified using other contentious wiki edit histories (such as Scientology). The visualization displays represented the edit history of these pages similarly, demonstrating that the tool can be used with real data. VSTI is evaluating this tool for use in various analysis programs for the intelligence community.

The Prajna Project is a toolkit developed by Edward Swing. The custom application was built at VSTI. Other VSTI programs have since incorporated some of the new components that were developed for this contest.

Two Page Summary: YES

Prajna_Wiki.pdf

ANSWERS:

Wiki-1: What are the factions represented in the edit pages and who are its members? In other words, describe the groups and their members based on their editing changes.

GROUP: Members
Paraiso Supporters: Amado, Estirabot, Savanna, Socorro, VictoriaV
Paraiso Opponents: 66.66.125.x, Agustin, DailosTamanca, DavidMoron, Ricarda, Rm99
Neutral Editors: Adriano, Edemir, Sara
Automated Processes: BakBOT
Vandals: Alejo, many others

Only editors with a significant number of edits were included.

Detailed Answer:

Video

To identify the factions, we designed two different capabilities within the application. First, we created a social graph from the edits that reverted a previous edit. A reversion usually signified either the removal of vandalism or a contentious discussion. This process created a number of disjoint networks which we could examine in detail. We added color coding to identify vandals (red), normal editors (blue), and contentious editors (purple). We defined a contentious editor as one who had some edits identified as vandalism by a member of an opposing faction, where those edits made up less than 50% of the editor's total edits. Each reversion was classified as a simple reversion, a Good Faith reversion (or correction), or an Undoing of a Reversion. Each of these reversion types could be included or excluded from the display.
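
The color-coding rule can be expressed in a few lines of Java. This is a minimal sketch, not the Prajna code: the names `EditorClassifier` and `classify` are hypothetical, and the text only defines the contentious case, so treating an editor with 50% or more flagged edits as a vandal is an assumption.

```java
/** Hypothetical sketch of the editor color-coding rule; not part of Prajna. */
class EditorClassifier {
    enum Category { NORMAL, CONTENTIOUS, VANDAL } // blue, purple, red

    /**
     * @param totalEdits   all edits made by the editor
     * @param flaggedEdits edits that another editor reverted as vandalism
     */
    static Category classify(int totalEdits, int flaggedEdits) {
        if (totalEdits <= 0 || flaggedEdits < 0 || flaggedEdits > totalEdits) {
            throw new IllegalArgumentException("invalid edit counts");
        }
        if (flaggedEdits == 0) {
            return Category.NORMAL;
        }
        // Fewer than 50% flagged: contentious. Assumption: otherwise a vandal.
        return (2 * flaggedEdits < totalEdits) ? Category.CONTENTIOUS : Category.VANDAL;
    }
}
```

Comparing `2 * flaggedEdits` with `totalEdits` keeps the 50% test in integer arithmetic, so the boundary case (exactly half) is unambiguous.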


Clusters of Discussion. Each cell can be expanded to view, and interact with, the particular discussions.


Particularly active discussion group, including many editors from different factions.

Designing the utility to parse the various comments proved difficult. Because of the slang terms, shorthand, and misspelled words, designing a completely robust utility would be impossible, particularly since the tool was also intended for use on other Wikipedia pages. However, by parsing common phrases, particularly for reversions, we were able to glean a reasonable amount of information. To correctly identify topics, we matched the various topic names to each other using a word-distance algorithm. This removed simple typing errors and differences in capitalization.
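
The text does not say which word-distance algorithm was used. The sketch below assumes Levenshtein edit distance on lower-cased names, with a small threshold for deciding that two topic names are the same. The class `TopicMatcher`, its methods, and the threshold parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of topic-name matching by edit distance; not part of Prajna. */
class TopicMatcher {
    private final List<String> canonical = new ArrayList<>();
    private final int maxDistance;

    TopicMatcher(int maxDistance) {
        this.maxDistance = maxDistance;
    }

    /** Levenshtein distance: minimum single-character insertions, deletions, substitutions. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the first known topic within maxDistance, or registers the name as new. */
    String resolve(String rawTopic) {
        String name = rawTopic.trim();
        String key = name.toLowerCase(Locale.ROOT);
        for (String known : canonical) {
            if (distance(key, known.toLowerCase(Locale.ROOT)) <= maxDistance) {
                return known;
            }
        }
        canonical.add(name);
        return name;
    }
}
```

With a threshold of 2, a misspelling such as "home helth care" resolves to an earlier "Home Health Care", while an unrelated name is registered as a new topic. A fixed threshold can merge short, genuinely different names, so a real tool might scale it with name length.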

We also added the capability to filter out the editors based upon a minimum number of edits. This allowed us to ignore those editors who contributed little to the actual discussion. It also filtered out the vandals, who typically only vandalized the wiki page once or twice. Using this feature, we could see the major contributors to the article.
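
A minimum-edit filter of this kind amounts to counting edits per editor and keeping those at or above a threshold. The following Java sketch shows one way to do it; `EditCountFilter` and `activeEditors` are hypothetical names, and the input is assumed to be a list holding the author of each edit.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of the minimum-edit filter; not part of Prajna. */
class EditCountFilter {
    /**
     * @param editAuthors one entry per edit, naming the editor who made it
     * @param minEdits    minimum number of edits required to keep an editor
     * @return editors with at least minEdits edits, in order of first appearance
     */
    static Set<String> activeEditors(List<String> editAuthors, int minEdits) {
        // Count edits per editor, preserving first-seen order.
        Map<String, Long> counts = editAuthors.stream()
            .collect(Collectors.groupingBy(a -> a, LinkedHashMap::new, Collectors.counting()));
        return counts.entrySet().stream()
            .filter(e -> e.getValue() >= minEdits)
            .map(Map.Entry::getKey)
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}
```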


Contributors in a discussion with at least 6 edits. The Home Health Care topic is highlighted, along with topic contributors Edemir, Sara, and RyogaNica.

To visualize the different factions, we designed a custom graph arrangement which used the edges (reversions) to separate the nodes. This arrangement divided the nodes of a particular subgraph into groups based upon who they were connected to, with the aim of placing editors who argued with each other in separate groups. Editors who had similar opponents could be grouped together, or left as separate entities.
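
The arrangement itself is not specified in detail. As a rough illustration of the idea, the Java sketch below treats each reversion as an "opponent" edge and walks each connected subgraph breadth-first, placing every newly reached editor on the side opposite the editor who led to them. This is a swapped-in heuristic, not the Prajna arrangement: the class `FactionSplitter` and its methods are hypothetical, and the two-sided split is an assumption that only separates every pair of opponents when the subgraph contains no odd cycle of reversions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of separating editors by reversion edges; not part of Prajna. */
class FactionSplitter {
    // editor -> editors they reverted or were reverted by
    private final Map<String, Set<String>> opponents = new LinkedHashMap<>();

    void addReversion(String reverter, String reverted) {
        opponents.computeIfAbsent(reverter, k -> new HashSet<>()).add(reverted);
        opponents.computeIfAbsent(reverted, k -> new HashSet<>()).add(reverter);
    }

    /** Assigns each editor to side 0 or 1, putting direct opponents on opposite sides where possible. */
    Map<String, Integer> split() {
        Map<String, Integer> side = new HashMap<>();
        for (String start : opponents.keySet()) {
            if (side.containsKey(start)) continue; // already reached from an earlier subgraph walk
            side.put(start, 0);
            Deque<String> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                String editor = queue.remove();
                for (String other : opponents.get(editor)) {
                    if (!side.containsKey(other)) {
                        side.put(other, 1 - side.get(editor));
                        queue.add(other);
                    }
                }
            }
        }
        return side;
    }
}
```

Editors who share opponents end up on the same side, which mirrors the observation that editors with similar opponents can be grouped together.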


Editors grouped by reversions. This arrangement shows various individuals with a common viewpoint.

In order to analyze discussions about a particular topic, we added the capability to view the comments relating to that topic, and highlight editors who contributed to that topic. We also added a feature to highlight editors who contributed to a specific article section. This allowed us to identify those editors who were the focal point of certain topics in the graph display. By showing all comments associated with a particular topic, we can see the various editors, their points of view, and activity levels. We were able to discern topic leaders as well.
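
The lookup behind such a topic view can be sketched as an index from topic to editor to comments. The Java below is an illustrative assumption, not the Prajna data model: `TopicIndex` and its methods are hypothetical, and using the highest comment count as a stand-in for "topic leader" is a simplification of what was judged by reading the comments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of a topic-to-comments index; not part of Prajna. */
class TopicIndex {
    // topic -> (editor -> that editor's comments, in arrival order)
    private final Map<String, Map<String, List<String>>> index = new LinkedHashMap<>();

    void add(String topic, String editor, String comment) {
        index.computeIfAbsent(topic, t -> new LinkedHashMap<>())
             .computeIfAbsent(editor, e -> new ArrayList<>())
             .add(comment);
    }

    private Map<String, List<String>> forTopic(String topic) {
        Map<String, List<String>> byEditor = index.get(topic);
        return byEditor == null ? Collections.emptyMap() : byEditor;
    }

    /** Editors to highlight for a topic. */
    Set<String> contributors(String topic) {
        return forTopic(topic).keySet();
    }

    /** Comments one editor made on a topic. */
    List<String> commentsBy(String topic, String editor) {
        List<String> comments = forTopic(topic).get(editor);
        return comments == null ? Collections.emptyList() : comments;
    }

    /** Editor with the most comments on the topic (first one wins a tie). */
    Optional<String> mostActive(String topic) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, List<String>> e : forTopic(topic).entrySet()) {
            if (e.getValue().size() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue().size();
            }
        }
        return Optional.ofNullable(best);
    }
}
```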


Reviewing comments made by RyogaNica in the Comment Review pane.

This information, combined with the editors' comments in the discussion page for the Paraiso Manifesto, allowed us to identify certain key personalities and factions in the ongoing discussions.


Wiki-2: Is the Paraiso movement involved in violent activities?

YES

List of wiki edits providing evidence

Short Answer:

More properly, there is reasonable cause for suspicion, though no solid proof.

The wiki slang and terseness of the comments prevented any automated reasoning process. Therefore, I reviewed the comments manually, using the discussion viewer for various topics. Certain topics, such as Controversy and Criticism, provided insight into various Paraiso activities.

References to other countries show that Paraiso was not welcome there. Mexico banned the movement, and Belgium prosecuted members of the movement for their activities. In addition, Paraiso supporters removed references to activities in Spain and Canada. This suggests controversial activity.


Popup display window which lists the comments for the Controversy and Criticism topic.

In addition, Edemir, a generally neutral editor, referred to a confrontation between Paraiso and the Dept. of Health. Shortly afterward, Alphanzo vandalized the page with assertions about violent activity. While this was vandalism, the temporal coincidence is suspect.

We also note a pattern of revisions where Paraiso supporters attempt to remove negative information about the sect.

References to splinter groups raise the possibility that the central Paraiso movement is largely peaceful, with violent splinter groups.